Creating Publication-Quality Graphics

Questions

  • How can I create publication-quality graphics in R?

Objectives

  • To be able to use ggplot2 to generate publication-quality graphics.

  • To understand the basic grammar of graphics, including the aesthetics and geometry layers, adding statistics, transforming scales, and coloring or panelling by groups.

Plotting the data is one of the best ways to quickly explore it and generate hypotheses about various relationships between variables.

There are several plotting systems in R, but today we will focus on ggplot2, which implements the grammar of graphics: a coherent system for describing the components that make up a visual representation of data. For more information on the principles and thinking behind the ggplot2 graphics system, please refer to “A Layered Grammar of Graphics” by Hadley Wickham (@hadleywickham).

The advantage of ggplot2 is that it allows R users to create publication-quality graphics with just a few lines of code. ggplot2 has a large user base and is actively developed and extended by the community.

Getting started

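ggplot2 is a core member of the tidyverse family of packages. Installing and loading the tidyverse package (which shares its name with the family) loads ggplot2 together with the other tidyverse packages we will need for this workshop; the gapminder data come in a separate package. Let's get started!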

# install.packages("tidyverse")
# install.packages("gapminder")
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.0     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ─────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(gapminder)

If the code above produces the error “there is no package called ‘tidyverse’” (or ‘gapminder’), uncomment the corresponding install.packages() line (remove the #) and run it before loading the library. You only need to install a package once, but you will have to reload it with the library() command every time you restart R.
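As an alternative to commenting and uncommenting lines, a common pattern is to install a package only if it is missing. The sketch below is one way to do this, assuming the two packages used so far in this lesson; the variable names needed and not_installed are just illustrative.

```r
# Packages needed so far in this lesson (the bonus sections later also
# use gganimate, gifski, sf and rnaturalearth)
needed <- c("tidyverse", "gapminder")

# requireNamespace() returns TRUE/FALSE instead of throwing an error,
# so it can tell us which packages are not installed yet
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!is_installed]

# Install only what is missing (this step needs an internet connection)
if (length(not_installed) > 0) {
  install.packages(not_installed)
}

# Loading still has to be repeated in every new R session
library(tidyverse)
library(gapminder)
```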

Today we will be working with the gapminder dataset, an excerpt from the Gapminder data. Once the gapminder package is loaded, the data are already available to you.

You can look at the contents of the gapminder data frame by simply typing gapminder either in an R chunk or in the console. A data frame is a rectangular collection of data, where variables are organized as columns and observations are listed as rows.

gapminder
## # A tibble: 1,704 x 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # … with 1,694 more rows

The dataset contains the following fields:

  • country: country name
  • continent: continent name
  • year: year of observation
  • lifeExp: life expectancy at birth
  • pop: total population
  • gdpPercap: per-capita GDP
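A compact way to check the type of each field, and how many distinct values the grouping fields have, is sketched below using glimpse() and n_distinct() from dplyr (loaded with the tidyverse). These counts help later when choosing between color, shape and faceting; the n_* variable names are our own.

```r
library(tidyverse)
library(gapminder)

# glimpse() prints one line per column: its name, type and first few values
glimpse(gapminder)

# Count the distinct values of the grouping variables
n_countries  <- n_distinct(gapminder$country)
n_continents <- n_distinct(gapminder$continent)
n_years      <- n_distinct(gapminder$year)
c(countries = n_countries, continents = n_continents, years = n_years)
# expected: 142 countries, 5 continents and 12 years (142 x 12 = 1704 rows)
```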

More information about the package and the data is available in the help. Just type ?gapminder in the console, located in the bottom panel of RStudio, or type gapminder in the search field of the Help tab in the bottom-right RStudio panel. Whenever you are unsure about anything in R, it is a good idea to check the help file using one of these two methods.

Creating the first plot

Here’s a question that we would like to answer using the gapminder data: do people in rich countries live longer than people in poor countries? The answer may be quite intuitive, but we will take our investigation further: what does the relationship between GDP per capita and life expectancy look like? Is this relationship linear? Non-linear? Are there exceptions to the general rule (outliers)?

To plot the gapminder data, run the following code in an R chunk or in the console. It puts gdpPercap on the x-axis and lifeExp on the y-axis:

ggplot(data = gapminder) + 
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp))

Note that we split the command over two lines. The “plus” sign indicates that the command is not finished yet and that the next line should be interpreted as an additional layer added to the preceding ggplot() call. In other words, when a ggplot2 command spans several lines, the + sign goes at the end of a line, not at the beginning of the next one.

The plot shows a positive, non-linear relationship between GDP per capita and life expectancy.

Does this graph confirm or disprove your initial hypothesis about the relationship between these variables?

Note that to create a plot with the ggplot2 system, you start your command with the ggplot() function. It creates an empty coordinate system and initializes the dataset to be used in the graph (supplied as the first argument to ggplot()). To create a graphical representation of the data, we add one or more layers to this otherwise empty graph. Functions starting with the prefix geom_ create a visual representation of the data. In this case we added a scatter of points using the geom_point() function. There are many geoms in ggplot2, some of which we will learn in this lesson.

Each geom_ function maps variables from the previously specified dataset to aesthetic elements of the graph, such as axes, shapes or colors. The first argument of any geom_ function, mapping, expects you to specify these mappings wrapped in the aes() (short for aesthetics) function. In this case, we mapped the gdpPercap and lifeExp variables from the gapminder dataset to the x- and y-axis, respectively (using the x and y arguments of aes()).
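The layered idea can be seen directly in code: a ggplot object can be stored in a variable and layers added to it one at a time. This is a small sketch; the names base and scatter are our own, and counting elements of the layers field is only meant to illustrate what each + does.

```r
library(ggplot2)
library(gapminder)

# ggplot() on its own only sets up the data and an empty coordinate system
base <- ggplot(data = gapminder)
length(base$layers)     # 0: nothing would be drawn yet

# Each `+` returns a new ggplot object with one more layer
scatter <- base + geom_point(mapping = aes(x = gdpPercap, y = lifeExp))
length(scatter$layers)  # 1: a single point layer

# The plot is only drawn when the object is printed
print(scatter)
```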

Generally speaking, the template for visualizing data in ggplot2 can be summarized as follows:

`ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))`

In the remainder of this lesson we will learn how to extend and complete this template with different elements to produce various visualizations. First, we will take a closer look at the <MAPPINGS> component.

Challenge 1.

Assignment

  • How did life expectancy change over time? What do you observe? Note that many points are plotted on top of each other. This is called “overplotting”. Try a different geom_ function called geom_jitter(). It spreads the points apart a little by adding random noise.

Hint: the gapminder dataset has a column called year, which should appear on the x-axis.

  • See if you can visualize life expectancy by continent. Which continent tends to have the highest life expectancy (notice the density of the points along the y-axis)? Which has the lowest? Which continent has the largest spread in life expectancy values? Which has the smallest?

Solution

## Part 1
ggplot(gapminder)+
  geom_point(mapping = aes(x=year, y=lifeExp))

# fix overplotting
ggplot(gapminder)+
  geom_jitter(mapping = aes(x=year, y=lifeExp))

## Part 2
ggplot(gapminder)+
  geom_jitter(mapping = aes(x=continent, y=lifeExp))
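geom_jitter() is a shortcut for geom_point() with a jitter position adjustment. Writing the adjustment out, as in the sketch below, lets you control how far points move and makes the noise reproducible. The width value of 1.5 is our own choice: it keeps points from neighboring five-year periods apart, while height = 0 leaves the life expectancy values untouched.

```r
library(ggplot2)
library(gapminder)

# position_jitter() arguments:
#   width  = maximum horizontal shift (here, up to 1.5 years either way)
#   height = 0 keeps lifeExp values exactly as recorded
#   seed   = makes the random noise reproducible between runs
jittered <- ggplot(gapminder) +
  geom_point(mapping = aes(x = year, y = lifeExp),
             position = position_jitter(width = 1.5, height = 0, seed = 1))
jittered
```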

Aesthetic mappings

What if we want to combine the graphs from the two parts of the previous challenge and show the relationship between three variables in the same graph? It turns out we don’t need a third geometric dimension: we can simply use color.

The following graph maps continent variable from gapminder dataset to the color aesthetic of the plot. Let’s take a look:

ggplot(data = gapminder) + 
  geom_jitter(mapping = aes(x = year, y = lifeExp, color=continent))

Challenge 2.

Assignment

  • What will happen if you switch the mappings of continent and year in the previous example? Is the graph still useful? Why? What if you map color aesthetic to country? What has changed? How is year different from country? What is the limitation of the color aesthetic, when used to visualize different types of data?

  • Can you add a little color to our initial graph of life expectancy by GDP per capita? Color the points by continent. There seem to be some outliers in this graph. Can you now spot which continent these points belong to? How about using a color gradient to illustrate change over time?

Hint: you may want to transform GDP per capita to a logarithmic scale before plotting. Just wrap the name of the variable in the log() function.

Solution

## Part 1
ggplot(data = gapminder) + 
  geom_jitter(mapping = aes(x = continent, y = lifeExp, color=year))

# Color by country
ggplot(data = gapminder) + 
  geom_jitter(mapping = aes(x = continent, y = lifeExp, color=country))  

## Part 2
ggplot(data = gapminder) + 
  geom_jitter(mapping = aes(x = log(gdpPercap), y = lifeExp, color=continent))

# change over time
ggplot(data = gapminder) + 
  geom_jitter(mapping = aes(x = log(gdpPercap), y = lifeExp, color=year))

More aesthetics

There are other aesthetics that can come in handy. One of them is size. This aesthetic varies the size of the data points to illustrate another continuous variable, such as country population. Let's look at four dimensions at once!

ggplot(data = gapminder) + 
  geom_point(mapping = aes(x = log(gdpPercap), y = lifeExp, color=continent, size=pop))

There’s one more useful aesthetic, called shape, which is good for visualizing low-cardinality categorical variables (categorical variables with a small number of unique values). It lets you plot the data with different shapes (other than circles) for different categories.

Challenge 3.

Assignment

  • Blow your mind by visualizing five(!) dimensions in the same graph. Modify the previous example by mapping year to color and continent to shape. What can you say about those Asian outliers: do they belong to small or large countries? Are they from earlier or later time periods?

Solution

ggplot(data = gapminder) + 
  geom_point(mapping = aes(x = log(gdpPercap), y = lifeExp, color=year, shape=continent, size=pop))

Non-data linked properties

Combining too many aesthetics in the same graph can make it quite busy. However, you can always remove certain aesthetic properties and use several graphs to highlight different aspects of data.

Until now, we have explored different aesthetic properties of a graph mapped to certain variables. What if you want to recolor all data points, or plot them all with a particular shape? In that case the color or shape is no longer mapped to any data, so you supply it to the geom_ function as a separate argument (outside of the mapping). Here’s our initial graph with all points colored blue.

ggplot(data = gapminder) + 
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp), alpha=0.1, size=2, color="blue")

Note: this plot also uses the alpha aesthetic, which varies the “opacity” of data points from completely opaque (alpha=1) to completely transparent (alpha=0). Feel free to experiment with it, setting the transparency of the data points both inside and outside the aes() mapping. What could be the benefit of each method?

Once more, observe that in the example above the color is not mapped to any variable from the gapminder dataset and applies equally to all data points; therefore it is outside the mapping argument and is not wrapped in the aes() function. Note that unmapped colors are supplied as character strings (in quotes), size is a number (point size in mm) and shape is the index of the shape in R’s internal set of point shapes (where square is 0, circle is 1, triangle is 2 and small filled circle is 20). Explore different shapes by varying the shape number between 0 and 25, or see the ggplot2 documentation vignette on aesthetic specifications (http://docs.ggplot2.org/current/vignettes/ggplot2-specs.html) for details. This document can also be opened from within R by calling vignette("ggplot2-specs").
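The difference between setting and mapping a property is a common source of confusion. The sketch below contrasts the two by inspecting the colors ggplot2 actually assigns, using ggplot_build(), which returns each layer's data after all mappings have been computed; set_blue and mapped_blue are illustrative names.

```r
library(ggplot2)
library(gapminder)

# Setting: the color is a fixed value given OUTSIDE aes(); every point is blue
set_blue <- ggplot(gapminder) +
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp), color = "blue")

# Mapping: "blue" INSIDE aes() is treated as data, i.e. a one-level
# categorical variable. ggplot2 picks a default palette color for it and
# adds a legend with a single key labelled "blue", so the points are NOT blue.
mapped_blue <- ggplot(gapminder) +
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp, color = "blue"))

unique(ggplot_build(set_blue)$data[[1]]$colour)     # "blue"
unique(ggplot_build(mapped_blue)$data[[1]]$colour)  # a default palette color
```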

Geometrical objects

Next, we will consider different options for the <GEOM_FUNCTION> component of our ggplot2 template. Using different geom_ functions, you can highlight different aspects of the data. For example, we could connect the data points belonging to the same country into a line and illustrate the development of life expectancy over time for each country separately, using the geom_line() function.

Some geom_ functions need additional aesthetics, such as the group aesthetic in geom_line(). Here group tells geom_line() to draw multiple lines, one per country, instead of joining unrelated points together. To keep the lines organized, we will color them by continent.

ggplot(data = gapminder) + 
  geom_line(mapping = aes(x = year, y = lifeExp, group=country, color=continent))
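To see what the group aesthetic does, we can count how many groups (and therefore lines) ggplot2 creates with and without it. This sketch again uses ggplot_build() to inspect the computed layer data; when group is not given, ggplot2 forms groups from the discrete variables in the mapping, here continent.

```r
library(ggplot2)
library(gapminder)

lines_by_country <- ggplot(data = gapminder) +
  geom_line(mapping = aes(x = year, y = lifeExp, group = country, color = continent))

# Every row of the built layer data carries a `group` id;
# with group = country there is one id (and one line) per country
length(unique(ggplot_build(lines_by_country)$data[[1]]$group))    # 142

# Without `group`, the discrete color variable defines the groups, so all
# points of a continent would be joined into a single zigzag line
lines_by_continent <- ggplot(data = gapminder) +
  geom_line(mapping = aes(x = year, y = lifeExp, color = continent))
length(unique(ggplot_build(lines_by_continent)$data[[1]]$group))  # 5
```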

Note how life expectancy suddenly drops for certain countries for a short period of time. Later in this workshop, we will learn how to zoom in on those tragic periods of history and investigate which countries experienced them.

Another useful geom function is geom_boxplot(). It adds a layer with a “box and whiskers” plot illustrating the distribution of values within categories. The following chart breaks down life expectancy by continent: the box spans the first to the third quartile (the 25th and 75th percentiles), the middle bar marks the median, and the whiskers extend to the most extreme values lying within 1.5 times the interquartile range (IQR) of the box. Points beyond the whiskers are plotted individually as outliers.

ggplot(data = gapminder) + 
  geom_boxplot(mapping = aes(x = continent, y = lifeExp))
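To make the boxplot rules concrete, the sketch below recomputes the box and whiskers for Africa by hand and compares them with the values ggplot2 computed. It assumes the default quantile() definition, which is also what ggplot2's boxplot statistic uses for unweighted data; names such as africa and whiskers are our own.

```r
library(ggplot2)
library(gapminder)

# Life expectancy values for one continent
africa <- gapminder$lifeExp[gapminder$continent == "Africa"]

# Box: first quartile, median, third quartile
q <- quantile(africa, probs = c(0.25, 0.5, 0.75))
iqr <- q[[3]] - q[[1]]

# Whiskers: the most extreme observations within 1.5 * IQR of the box
inside <- africa[africa >= q[[1]] - 1.5 * iqr & africa <= q[[3]] + 1.5 * iqr]
whiskers <- range(inside)

# Points beyond the whiskers are drawn individually as outliers
outliers <- africa[africa < whiskers[1] | africa > whiskers[2]]

# Statistics ggplot2 computed for the first box (Africa)
box <- ggplot_build(ggplot(gapminder) +
                      geom_boxplot(mapping = aes(x = continent, y = lifeExp)))$data[[1]][1, ]
c(box$ymin, box$lower, box$middle, box$upper, box$ymax)
c(whiskers[1], q, whiskers[2])
```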

Layers can be added on top of each other. In the following graph we place the boxplots over jittered points to see the distribution of the data, including the outliers, more clearly. We can also map two aesthetic properties to the same variable: here continent determines both the x position and the color.

ggplot(data = gapminder) + 
  geom_jitter(mapping = aes(x = continent, y = lifeExp, color=continent)) +
  geom_boxplot(mapping = aes(x = continent, y = lifeExp, color=continent))

This was slightly inefficient because of duplicated code: we had to specify the same mappings for two layers. To avoid this, you can move the common arguments of the geom_ functions to the main ggplot() function. Every layer will then “inherit” the arguments specified in the “parent” function.

ggplot(data = gapminder, mapping = aes(x = continent, y = lifeExp, color=continent)) + 
  geom_jitter() +
  geom_boxplot()

You can still add layer-specific mappings or other arguments by specifying them within individual geoms. We recommend building each layer separately and then moving the common arguments up to the “parent” function (“first explicit, then implicit”).
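For example, once the shared mappings live in ggplot(), each geom can still take its own settings. In the sketch below, the jitter width and alpha values are illustrative choices; fill = NA keeps the boxes see-through, and outlier.shape = NA hides the boxplot's own outlier points, since the jitter layer already shows every observation.

```r
library(ggplot2)
library(gapminder)

# Shared mappings in the "parent" ggplot() call
layered <- ggplot(data = gapminder, mapping = aes(x = continent, y = lifeExp, color = continent)) +
  # Layer-specific settings: narrower jitter and some transparency
  geom_jitter(width = 0.25, alpha = 0.3) +
  # Transparent boxes and no duplicate outlier points
  geom_boxplot(fill = NA, outlier.shape = NA)
layered
```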

We can use linear models to highlight differences in the relationship between GDP per capita and life expectancy across continents. Notice that we added an argument to the geom_smooth() function to specify the type of model we want ggplot2 to build from the data (in this case, a linear model, method="lm"). geom_smooth() also helpfully draws a confidence band around each fitted line (the shaded gray area, a 95% interval by default), indicating how uncertain the fit is. For more information on the available smoothing methods, please refer to the help (type ?geom_smooth).

ggplot(data = gapminder, mapping = aes(x = log(gdpPercap), y = lifeExp, color=continent)) +
  geom_point(alpha=0.5) +
  geom_smooth(method="lm")
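Under the hood, method="lm" fits one linear model per color group, here lifeExp ~ log(gdpPercap) for each continent. The sketch below fits the same models by hand with lm() so you can read off their slopes; fits and slopes are illustrative names.

```r
library(gapminder)

# One linear model per continent, same formula as the plotted trend lines
fits <- lapply(split(gapminder, gapminder$continent), function(d) {
  lm(lifeExp ~ log(gdpPercap), data = d)
})

# Slope: expected change in life expectancy (years) when log(gdpPercap)
# increases by one unit, i.e. when GDP per capita is multiplied by e (~2.7)
slopes <- sapply(fits, function(m) coef(m)[["log(gdpPercap)"]])
round(slopes, 2)
```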

Notice that we also used the previously discussed visual property alpha to make the data points more transparent and the trend lines stand out. As you might remember, alpha can also be used as a mapped aesthetic, i.e. transparency can be made to vary with the value of a variable.

Challenge 4.

Assignment

  • Modify the graph above to force R to create a single regression line for all data points. Keep the points colored by continent. Hint: there are several alternative solutions to this problem.

Solution

In the graph above, each geom inherited all three mappings: x, y and color. If we want only a single linear model to be built, we need to limit the effect of the color aesthetic to the geom_point() layer, by moving it from the “parent” function to the layer where we want it to apply. Note, though, that because we still want the color to be mapped to the continent variable, it must be wrapped in the aes() function and supplied to the mapping argument.

An alternative solution is more of a “hack”: it overrides the inherited color aesthetic in the geom_smooth() layer with a fixed color. Once color is set to a constant, that layer no longer groups the data by continent, so a single line is fitted. This works fine, but it is a little harder to see what is going on.

ggplot(data = gapminder, mapping = aes(x = log(gdpPercap), y = lifeExp)) +
  geom_point(mapping=aes(color=continent), alpha=0.5) +
  geom_smooth(method="lm")

# Alternative solution
ggplot(data = gapminder, mapping = aes(x = log(gdpPercap), y = lifeExp, color=continent)) +
  geom_point(alpha=0.5) +
  geom_smooth(method="lm", color="black")

Correcting the scale

As you can see, the x-axis label of our graph says log(gdpPercap), which indicates that we are not plotting the original data but the output of the log() function. The same effect (with a slightly more pleasing x-axis label) can be achieved by adding the x-axis scale transformation as a separate layer. Instead of transforming the values, we transform the scale of the x-axis.

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color=continent)) +
  geom_point() +
  geom_smooth() +
  scale_x_log10()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

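Now the x-axis uses a log10 scale: the tick labels show the original GDP per capita values, but equal distances along the axis correspond to equal ratios, and the data plotted on this scale look more linear. Certain scale and coordinate functions can produce similar visual effects, but they interact with other layers quite differently: a scale transformation is applied before statistics such as geom_smooth() are computed, whereas a coordinate transformation is applied afterwards. Check out the online ggplot2 documentation for more details and examples of scale and coordinate transformations.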
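The difference between the two kinds of transformation shows up clearly with a linear smoother. In the sketch below, both plots use geom_smooth(method = "lm"), but with scale_x_log10() the model is fitted to the log-transformed values, whereas with coord_trans() it is fitted to the original values and only the drawing is transformed. Recent ggplot2 releases may suggest the newer name coord_transform() for coord_trans().

```r
library(ggplot2)
library(gapminder)

# Scale transformation: values are log10-transformed BEFORE the model is
# fitted, so the fit is lifeExp ~ log10(gdpPercap) and looks straight
p_scale <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm") +
  scale_x_log10()

# Coordinate transformation: the model is fitted on the ORIGINAL values
# (lifeExp ~ gdpPercap) and only the finished drawing is log-transformed,
# so the same "linear" fit appears curved
p_coord <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm") +
  coord_trans(x = "log10")

p_scale
p_coord
```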

Challenge 5.

Assignment

  • Make a boxplot of life expectancy by year. Hint: you may need to do something with the year variable to force it to be categorical, or follow the advice suggested by ggplot2. When was the interquartile range of life expectancy the smallest? Make the same plot of gdpPercap (on a log scale) per year. Compared to 1952, is the world in 2007 more or less diverse in terms of the IQR of GDP per capita?

  • Make histograms of untransformed and log-transformed gdpPercap. Note that a histogram requires you to specify only one variable, mapped to the x aesthetic. What is the shape of the distribution? Why is the choice of bins (the bins or binwidth argument) important for interpreting a histogram?

  • Plot a density estimate with geom_density() (also a univariate plot). How would you compare the densities of different continents?

  • Based on the graph produced by geom_density2d() of log GDP per capita vs life expectancy, how many clusters of data points can you identify? What if you look at it by continent?

Solution

## Part 1
# force year to become categorical
ggplot(gapminder)+
  geom_boxplot(mapping = aes(y=lifeExp, x=as.character(year))) # simple x=year will not work

# ggplot suggested solution
ggplot(gapminder)+
  geom_boxplot(mapping = aes(y=lifeExp, x=year, group=year))

# gdpPercap
ggplot(gapminder)+
  geom_boxplot(mapping = aes(y=gdpPercap, x=year, group=year))+
  scale_y_log10()

## Part 2
ggplot(gapminder)+
  geom_histogram(mapping = aes(x=gdpPercap)) 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# on log scale with higher number of bins
ggplot(gapminder)+
  geom_histogram(mapping = aes(x=gdpPercap),bins=100) +
  scale_x_log10()

## Part 3
# density
ggplot(gapminder)+
  geom_density(mapping = aes(x=gdpPercap)) +
  scale_x_log10()

# by continent
ggplot(gapminder)+
  geom_density(mapping = aes(x=gdpPercap, color=continent)) +
  scale_x_log10()

## Part 4
# Density 2d
ggplot(gapminder)+
  geom_density2d(mapping = aes(x=gdpPercap, y=lifeExp)) +
  scale_x_log10()

# by continent
ggplot(gapminder)+
  geom_density2d(mapping = aes(x=gdpPercap, y=lifeExp, color=continent)) +
  scale_x_log10()

Faceting

Multi-layered graphs employing several aesthetics can look crowded. To avoid this, you can split the data into panels of similar graphs. In ggplot2 this method is called “faceting”. Let's facet the graph above by continent and show the data points and the trend for each continent in a separate panel.

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth() +
  scale_x_log10() + 
  facet_wrap( vars(continent))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The facet_wrap() layer takes one or more variables wrapped in the vars() function, which tells ggplot2 to look them up in the plot's data. Here it draws a panel for each unique value in the continent column of the gapminder dataset. Faceting works best when the number of panels is limited. Notice that R places the panels from left to right, “wrapping” those that do not fit in one row onto a new row. To learn about more advanced faceting, including faceting over two variables, see the help page ?facet_grid.
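As a sketch of faceting over two variables, the code below uses facet_grid() to place years in rows and continents in columns. To keep the grid small we keep only 1952 and 2007, using filter() from dplyr, which we will study properly later; the names first_last and grid_plot are our own.

```r
library(tidyverse)
library(gapminder)

# Keep only the first and last year so the grid stays small
first_last <- filter(gapminder, year %in% c(1952, 2007))

# facet_grid(): one variable defines the rows, another the columns
grid_plot <- ggplot(first_last, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +
  facet_grid(rows = vars(year), cols = vars(continent))
grid_plot
```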

Note: in code written for older versions of ggplot2, which you may come across on the web, you will often see a “one-sided formula” inside facet_wrap(). Don’t panic: facet_wrap(~continent) is perfectly valid code that still works in the current version of the package.

Extending our ggplot2 template with what we have learned so far, we can now state:

`ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>)) + 
  <FACET_FUNCTION>`

Challenge 6.

Assignment

  • Try faceting by year, keeping the linear smoother. Is there any change in slope of the linear trend over the years? What if you look at linear models per continent?

Solution

# by year
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_x_log10() + 
  facet_wrap( vars(year))

# by continent
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_x_log10() + 
  facet_wrap( vars(continent))

Coordinate systems

Sometimes, when plotting a categorical variable on the x-axis, the bars end up too narrow and the labels overlap and become unreadable. One way of dealing with this is to flip the coordinate system, i.e. plot the same data as horizontal bars. Let’s show the population of every Asian country in 2007.

Note: this example requires filter() function, which we have not yet studied. Hang on, it is coming at you very soon!

ggplot(filter(gapminder, year==2007, continent=="Asia")) + 
  geom_bar(mapping = aes(x=country, y=pop), stat="identity") +
  coord_flip()
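If you also want the bars sorted, one option is to reorder the countries by population before plotting. The sketch below uses reorder() from base R and geom_col(), which is equivalent to geom_bar(stat = "identity"); asia_2007 and sorted_bars are illustrative names.

```r
library(tidyverse)
library(gapminder)

asia_2007 <- filter(gapminder, year == 2007, continent == "Asia")

# reorder() makes country a factor whose levels follow pop,
# so the bars are drawn from the smallest to the largest population
sorted_bars <- ggplot(asia_2007) +
  geom_col(mapping = aes(x = reorder(country, pop), y = pop)) +
  coord_flip() +
  labs(x = NULL, y = "Population")
sorted_bars
```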

There are many functions related to coordinate systems that allow, among other things, plotting in non-Cartesian coordinates (e.g. coord_polar() for polar coordinates, or coord_map() for map projections such as Mercator) and setting manual limits for the axes (coord_cartesian()).
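Setting limits in the coordinate system is not the same as setting them on the scale. The sketch below contrasts coord_cartesian(), which only zooms the view, with scale_y_continuous(limits = ...), which drops out-of-range values before the boxplot statistics are computed, so the boxes themselves change. The 60 to 85 year window is an arbitrary choice for illustration.

```r
library(ggplot2)
library(gapminder)

# Zooming with the coordinate system: all data are used to compute the
# boxplot statistics, then the view is cropped to 60-85 years
zoomed <- ggplot(gapminder, aes(x = continent, y = lifeExp)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(60, 85))

# Limiting the scale instead: values outside 60-85 are removed BEFORE the
# statistics are computed (ggplot2 warns about removed rows)
clipped <- ggplot(gapminder, aes(x = continent, y = lifeExp)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(60, 85))

zoomed
clipped
```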

Labeling the chart

Lastly, we will learn how to label and annotate the chart using the labs() and annotate() functions.

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
  geom_point() +
  scale_x_log10() +
  facet_wrap(vars(continent)) +
  # Add a title, subtitle, caption, axis labels and legend titles
  labs(title="Life Expectancy vs GDP per capita over time",
       subtitle="In the past 50 years, life expectancy has improved in most countries of the world",
       caption="Source: Gapminder foundation, https://www.gapminder.org/data/",
       x="GDP per capita, USD (inflation-adjusted)",
       y="Life expectancy, years",
       color="Continent",
       size="Population")
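The chart above only uses labs(). annotate() adds individual elements, such as text or a shaded rectangle, that are not mapped to the data. In the sketch below the rectangle's coordinates and the label text are arbitrary choices for illustration, given in data units; in a faceted plot the same annotation would appear in every panel.

```r
library(ggplot2)
library(gapminder)

annotated <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.2) +
  scale_x_log10() +
  # A shaded rectangle, positioned in data units (GDP in USD, years)
  annotate("rect", xmin = 20000, xmax = 120000, ymin = 30, ymax = 60,
           fill = "red", alpha = 0.1) +
  # A text label placed just above the rectangle
  annotate("text", x = 50000, y = 62, label = "Region of interest") +
  labs(x = "GDP per capita, USD", y = "Life expectancy, years")
annotated
```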

BONUS: Animation

The graph produced in the previous section looks quite good, but it draws all years on top of each other, so the reader cannot follow how the data change over time. This can be illustrated better by “animating” the time dimension of the data and showing the twelve yearly charts (1952 to 2007) to the reader one after another.

# install.packages("gganimate")
# install.packages("gifski")

library(gganimate)

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
  geom_point() +
  scale_x_log10() +
  facet_wrap(vars(continent)) +
  # Here comes the gganimate specific bits
  labs(title="Life Expectancy vs GDP per capita in {frame_time}",
       subtitle="In the past 50 years, life expectancy has improved in most countries of the world",
       caption="Source: Gapminder foundation, https://www.gapminder.org/data/",
       x="GDP per capita, USD (inflation-adjusted)",
       y="Life expectancy, years",
       color="Continent",
       size="Population") +
  transition_time(year) +
  ease_aes('linear')

Wrap-up

We conclude this lesson by reiterating our ggplot2 data visualization template.

`ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>),
                  stat = <STAT>) +
  <SCALE_FUNCTION> +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION> + 
  <LABS>`

Our template now contains eight components. However, it is rare that all of them need to be specified in a given graphic or chart: most of the time ggplot2 offers useful defaults for everything other than the data, geoms and mappings.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).

Still bored?

  • Use several graphs and necessary filters to narrow down your search to those few outliers with high gdpPercap. What are those countries and in which years? What might be the reason?

Hint: you may want to experiment with geom_text() to get the country labels to show on the chart.

  • Use several graphs and necessary filters to narrow down your search to those few outliers with extraordinarily low life expectancy. What are those countries and in which years? What might be the reason?

Plotting maps

When you are working with data from different countries, it may also be a good idea to use maps to convey your data in a familiar way. ggplot2 has a relatively new geom called geom_sf() that helps you plot maps and use aesthetics in the same way as in other geoms.

We have downloaded a world data file, called a shapefile, from thematicmapping.org to create maps. A shapefile is actually a set of files stored together in one folder, and the R package sf knows how to read the whole set as geographic data with a map coordinate system. In the world-map code of this section, however, we get the country boundaries from the rnaturalearth package, which returns them directly as an sf object.
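The path to the downloaded world shapefile is not given here, so as a stand-in the sketch below reads the small example shapefile of North Carolina counties that ships with the sf package, and maps one of its columns (AREA) to the fill aesthetic. The file and column names come from that example dataset, not from this lesson's data.

```r
library(sf)
library(ggplot2)

# st_read() reads the .shp file together with its companion files
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# An sf object is a data frame with one row per feature plus a geometry column
class(nc)
nrow(nc)

# geom_sf() draws the geometry; other columns are mapped to aesthetics
# exactly as in other geoms (here, fill by county area)
nc_map <- ggplot(nc) +
  geom_sf(mapping = aes(fill = AREA)) +
  theme_bw()
nc_map
```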

library(sf)
## Linking to GEOS 3.7.1, GDAL 2.4.2, PROJ 5.2.0
# install.packages("rnaturalearth")
# install.packages("rnaturalearthdata")  # country data used by ne_countries()
#
# try plotting the world map
# world <- rnaturalearth::ne_countries(returnclass = "sf")
# ggplot() +
#      geom_sf(data =  world) +
#      theme_bw()

world_map <- rnaturalearth::ne_countries(returnclass = "sf")
world_map
## Simple feature collection with 177 features and 63 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
##   scalerank      featurecla labelrank           sovereignt sov_a3 adm0_dif
## 0         1 Admin-0 country         3          Afghanistan    AFG        0
## 1         1 Admin-0 country         3               Angola    AGO        0
## 2         1 Admin-0 country         6              Albania    ALB        0
## 3         1 Admin-0 country         4 United Arab Emirates    ARE        0
## 4         1 Admin-0 country         2            Argentina    ARG        0
## 5         1 Admin-0 country         6              Armenia    ARM        0
## 6         1 Admin-0 country         4           Antarctica    ATA        0
## 7         3 Admin-0 country         6               France    FR1        1
## 8         1 Admin-0 country         2            Australia    AU1        1
## 9         1 Admin-0 country         4              Austria    AUT        0
##   level              type                               admin adm0_a3
## 0     2 Sovereign country                         Afghanistan     AFG
## 1     2 Sovereign country                              Angola     AGO
## 2     2 Sovereign country                             Albania     ALB
## 3     2 Sovereign country                United Arab Emirates     ARE
## 4     2 Sovereign country                           Argentina     ARG
## 5     2 Sovereign country                             Armenia     ARM
## 6     2     Indeterminate                          Antarctica     ATA
## 7     2        Dependency French Southern and Antarctic Lands     ATF
## 8     2           Country                           Australia     AUS
## 9     2 Sovereign country                             Austria     AUT
##   geou_dif                             geounit gu_a3 su_dif
## 0        0                         Afghanistan   AFG      0
## 1        0                              Angola   AGO      0
## 2        0                             Albania   ALB      0
## 3        0                United Arab Emirates   ARE      0
## 4        0                           Argentina   ARG      0
## 5        0                             Armenia   ARM      0
## 6        0                          Antarctica   ATA      0
## 7        0 French Southern and Antarctic Lands   ATF      0
## 8        0                           Australia   AUS      0
## 9        0                             Austria   AUT      0
##                               subunit su_a3 brk_diff
## 0                         Afghanistan   AFG        0
## 1                              Angola   AGO        0
## 2                             Albania   ALB        0
## 3                United Arab Emirates   ARE        0
## 4                           Argentina   ARG        0
## 5                             Armenia   ARM        0
## 6                          Antarctica   ATA        0
## 7 French Southern and Antarctic Lands   ATF        0
## 8                           Australia   AUS        0
## 9                             Austria   AUT        0
##                     name                           name_long brk_a3
## 0            Afghanistan                         Afghanistan    AFG
## 1                 Angola                              Angola    AGO
## 2                Albania                             Albania    ALB
## 3   United Arab Emirates                United Arab Emirates    ARE
## 4              Argentina                           Argentina    ARG
## 5                Armenia                             Armenia    ARM
## 6             Antarctica                          Antarctica    ATA
## 7 Fr. S. Antarctic Lands French Southern and Antarctic Lands    ATF
## 8              Australia                           Australia    AUS
## 9                Austria                             Austria    AUT
##                     brk_name brk_group     abbrev postal
## 0                Afghanistan      <NA>       Afg.     AF
## 1                     Angola      <NA>       Ang.     AO
## 2                    Albania      <NA>       Alb.     AL
## 3       United Arab Emirates      <NA>     U.A.E.     AE
## 4                  Argentina      <NA>       Arg.     AR
## 5                    Armenia      <NA>       Arm.    ARM
## 6                 Antarctica      <NA>       Ant.     AQ
## 7 Fr. S. and Antarctic Lands      <NA> Fr. S.A.L.     TF
## 8                  Australia      <NA>       Auz.     AU
## 9                    Austria      <NA>      Aust.      A
##                                              formal_en formal_fr note_adm0
## 0                         Islamic State of Afghanistan      <NA>      <NA>
## 1                          People's Republic of Angola      <NA>      <NA>
## 2                                  Republic of Albania      <NA>      <NA>
## 3                                 United Arab Emirates      <NA>      <NA>
## 4                                   Argentine Republic      <NA>      <NA>
## 5                                  Republic of Armenia      <NA>      <NA>
## 6                                                 <NA>      <NA>      <NA>
## 7 Territory of the French Southern and Antarctic Lands      <NA>       Fr.
## 8                            Commonwealth of Australia      <NA>      <NA>
## 9                                  Republic of Austria      <NA>      <NA>
##                           note_brk                           name_sort
## 0                             <NA>                         Afghanistan
## 1                             <NA>                              Angola
## 2                             <NA>                             Albania
## 3                             <NA>                United Arab Emirates
## 4                             <NA>                           Argentina
## 5                             <NA>                             Armenia
## 6 Multiple claims held in abeyance                          Antarctica
## 7                             <NA> French Southern and Antarctic Lands
## 8                             <NA>                           Australia
## 9                             <NA>                             Austria
##   name_alt mapcolor7 mapcolor8 mapcolor9 mapcolor13  pop_est gdp_md_est
## 0     <NA>         5         6         8          7 28400000    22270.0
## 1     <NA>         3         2         6          1 12799293   110300.0
## 2     <NA>         1         4         1          6  3639453    21810.0
## 3     <NA>         2         1         3          3  4798491   184300.0
## 4     <NA>         3         1         3         13 40913584   573900.0
## 5     <NA>         3         1         2         10  2967004    18770.0
## 6     <NA>         4         5         1         NA     3802      760.4
## 7     <NA>         7         5         9         11      140       16.0
## 8     <NA>         1         2         2          7 21262641   800200.0
## 9     <NA>         3         1         3          4  8210281   329500.0
##   pop_year lastcensus gdp_year                    economy
## 0       NA       1979       NA  7. Least developed region
## 1       NA       1970       NA  7. Least developed region
## 2       NA       2001       NA       6. Developing region
## 3       NA       2010       NA       6. Developing region
## 4       NA       2010       NA    5. Emerging region: G20
## 5       NA       2001       NA       6. Developing region
## 6       NA         NA       NA       6. Developing region
## 7       NA         NA       NA       6. Developing region
## 8       NA       2006       NA 2. Developed region: nonG7
## 9       NA       2011       NA 2. Developed region: nonG7
##                income_grp wikipedia fips_10 iso_a2 iso_a3 iso_n3 un_a3
## 0           5. Low income        NA    <NA>     AF    AFG    004   004
## 1  3. Upper middle income        NA    <NA>     AO    AGO    024   024
## 2  4. Lower middle income        NA    <NA>     AL    ALB    008   008
## 3 2. High income: nonOECD        NA    <NA>     AE    ARE    784   784
## 4  3. Upper middle income        NA    <NA>     AR    ARG    032   032
## 5  4. Lower middle income        NA    <NA>     AM    ARM    051   051
## 6 2. High income: nonOECD        NA    <NA>     AQ    ATA    010  <NA>
## 7 2. High income: nonOECD        NA    <NA>     TF    ATF    260  <NA>
## 8    1. High income: OECD        NA    <NA>     AU    AUS    036   036
## 9    1. High income: OECD        NA    <NA>     AT    AUT    040   040
##   wb_a2 wb_a3 woe_id adm0_a3_is adm0_a3_us adm0_a3_un adm0_a3_wb
## 0    AF   AFG     NA        AFG        AFG         NA         NA
## 1    AO   AGO     NA        AGO        AGO         NA         NA
## 2    AL   ALB     NA        ALB        ALB         NA         NA
## 3    AE   ARE     NA        ARE        ARE         NA         NA
## 4    AR   ARG     NA        ARG        ARG         NA         NA
## 5    AM   ARM     NA        ARM        ARM         NA         NA
## 6  <NA>  <NA>     NA        ATA        ATA         NA         NA
## 7  <NA>  <NA>     NA        ATF        ATF         NA         NA
## 8    AU   AUS     NA        AUS        AUS         NA         NA
## 9    AT   AUT     NA        AUT        AUT         NA         NA
##                 continent               region_un
## 0                    Asia                    Asia
## 1                  Africa                  Africa
## 2                  Europe                  Europe
## 3                    Asia                    Asia
## 4           South America                Americas
## 5                    Asia                    Asia
## 6              Antarctica              Antarctica
## 7 Seven seas (open ocean) Seven seas (open ocean)
## 8                 Oceania                 Oceania
## 9                  Europe                  Europe
##                   subregion                  region_wb name_len long_len
## 0             Southern Asia                 South Asia       11       11
## 1             Middle Africa         Sub-Saharan Africa        6        6
## 2           Southern Europe      Europe & Central Asia        7        7
## 3              Western Asia Middle East & North Africa       20       20
## 4             South America  Latin America & Caribbean        9        9
## 5              Western Asia      Europe & Central Asia        7        7
## 6                Antarctica                 Antarctica       10       10
## 7   Seven seas (open ocean)         Sub-Saharan Africa       22       35
## 8 Australia and New Zealand        East Asia & Pacific        9        9
## 9            Western Europe      Europe & Central Asia        7        7
##   abbrev_len tiny homepart                       geometry
## 0          4   NA        1 MULTIPOLYGON (((61.21082 35...
## 1          4   NA        1 MULTIPOLYGON (((16.32653 -5...
## 2          4   NA        1 MULTIPOLYGON (((20.59025 41...
## 3          6   NA        1 MULTIPOLYGON (((51.57952 24...
## 4          4   NA        1 MULTIPOLYGON (((-65.5 -55.2...
## 5          4   NA        1 MULTIPOLYGON (((43.58275 41...
## 6          4   NA        1 MULTIPOLYGON (((-59.57209 -...
## 7         10    2       NA MULTIPOLYGON (((68.935 -48....
## 8          4   NA        1 MULTIPOLYGON (((145.398 -40...
## 9          5   NA        1 MULTIPOLYGON (((16.97967 48...
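# Map each country's estimated population (pop_est) to the polygon fill colour
ggplot(world_map) +
  geom_sf(aes(fill = pop_est)) +   # draw the sf geometries, filled by population
  scale_fill_viridis_c() +         # continuous viridis colour scale for the fill
  coord_sf() +                     # coordinate system designed for sf objects
  theme_void()                     # drop axes, gridlines and background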
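In the rows printed above, pop_est ranges from 140 (French Southern and Antarctic Lands) to about 41 million (Argentina). On a linear colour scale, most countries therefore end up with very similar colours. One option is to transform the fill scale. The sketch below is a minimal, self-contained illustration, assuming the sf and ggplot2 packages are installed. The `toy_map` object, the `make_square()` helper and the population values are hypothetical stand-ins for `world_map`. It reuses the same layers as above but passes `trans = "log10"` to `scale_fill_viridis_c()`. In ggplot2 3.5.0 and later this argument is named `transform`; `trans` still works but is deprecated.

```r
library(ggplot2)
library(sf)

# Hypothetical helper: a closed unit square whose lower-left corner is at (x0, 0)
make_square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Hypothetical toy data standing in for world_map: three "countries" whose
# populations span several orders of magnitude (values are made up)
toy_map <- st_sf(
  name = c("A", "B", "C"),
  pop_est = c(140, 3.6e6, 4.1e7),
  geometry = st_sfc(make_square(0), make_square(1.5), make_square(3))
)

# Same layers as the world map, but with a log10-transformed fill scale so that
# small and large populations receive clearly different colours
p <- ggplot(toy_map) +
  geom_sf(aes(fill = pop_est)) +
  scale_fill_viridis_c(trans = "log10") +
  coord_sf() +
  theme_void()

print(p)
```

With the real data, the equivalent change is to replace `scale_fill_viridis_c()` with `scale_fill_viridis_c(trans = "log10")` in the `world_map` plot.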